10 research outputs found
Getting More out of Large Language Models for Proofs
Large language models have the potential to simplify formal theorem proving
and make it more accessible. But how to get the most out of these models is
still an open question. To answer this question, we take a step back and
explore the failure cases of these models using common prompting-based
techniques. Our talk will discuss these failure cases and what they can teach
us about how to get more out of these models.
Ornaments for Proof Reuse in Coq
Ornaments express relations between inductive types with the same inductive structure. We implement fully automatic proof reuse for a particular class of ornaments in a Coq plugin, and show how such a tool can give programmers the rewards of using indexed inductive types while automating away many of the costs. The plugin works directly on Coq code; it is the first ornamentation tool for a non-embedded dependently typed language. It is also the first tool to automatically identify ornaments: to lift a function or proof, the user must provide only the source type, the destination type, and the source function or proof. In taking advantage of the mathematical properties of ornaments, our approach produces faster functions and smaller terms than a more general approach to proof reuse in Coq.
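The core idea — that an indexed type like a length-indexed vector is a list "decorated" with an index, so list functions can be lifted mechanically — can be sketched in Python. This is a toy stand-in, not the plugin's algorithm: `Vec`, `lift_binop`, and the run-time length recomputation are illustrative assumptions (the real tool derives the index from the ornament inside Coq's type theory).

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Vec:
    """A 'vector': a list paired with its length index (toy stand-in
    for Coq's length-indexed vector type)."""
    n: int
    xs: List[int]

def lift_binop(f: Callable[[List[int], List[int]], List[int]]) -> Callable[[Vec, Vec], Vec]:
    """Lift a binary list operation to vectors, recomputing the length
    index from the result. The real plugin computes the new index
    statically from the ornament rather than at run time."""
    def lifted(v1: Vec, v2: Vec) -> Vec:
        ys = f(v1.xs, v2.xs)
        return Vec(len(ys), ys)
    return lifted

# The list function we already have...
def app(xs, ys):
    return xs + ys

# ...and its automatically lifted vector counterpart.
vapp = lift_binop(app)
v = vapp(Vec(2, [1, 2]), Vec(1, [3]))
print(v.n, v.xs)  # 3 [1, 2, 3]
```

The point of the sketch is the division of labor: the programmer supplies only the unindexed function; the lifting machinery supplies the indexed version and its index bookkeeping.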
Proof Repair Infrastructure for Supervised Models: Building a Large Proof Repair Dataset
We report on our efforts building a new, large proof-repair dataset and benchmark suite for the Coq proof assistant. The dataset is made up of Git commits from open-source projects with old and new versions of definitions and proofs aligned across commits. Building this dataset has been a significant undertaking, highlighting a number of challenges and gaps in existing infrastructure. We discuss these challenges and gaps, and we provide recommendations for how the proof assistant community can address them. Our hope is to make it easier to build datasets and benchmark suites so that machine-learning tools for proofs will move to target the tasks that matter most and do so equitably across proof assistants.
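The alignment step the abstract describes — pairing old and new versions of each definition across a commit — can be sketched as follows. This is a minimal, assumed approximation: the regex-based definition matcher and the two-version input strings are illustrative, and real alignment would parse the sources with Coq tooling rather than pattern-match them.

```python
import re
from typing import Dict, Tuple

# Match top-level Coq definitions/proofs by keyword and name (a toy
# approximation; a real pipeline would parse files with Coq's tooling).
DEF_RE = re.compile(
    r"^(Definition|Fixpoint|Lemma|Theorem)\s+(\w+)(.*?)"
    r"(?=^(?:Definition|Fixpoint|Lemma|Theorem)\s|\Z)",
    re.M | re.S,
)

def index_defs(src: str) -> Dict[str, str]:
    """Map each top-level definition name to its full text."""
    return {m.group(2): m.group(0).strip() for m in DEF_RE.finditer(src)}

def align(old_src: str, new_src: str) -> Dict[str, Tuple[str, str]]:
    """Pair the old and new versions of each definition that changed
    between the two file versions."""
    old, new = index_defs(old_src), index_defs(new_src)
    return {name: (old[name], new[name])
            for name in old.keys() & new.keys()
            if old[name] != new[name]}

old_v = "Definition n := 1.\nLemma n_pos : n > 0. Proof. auto. Qed."
new_v = "Definition n := 2.\nLemma n_pos : n > 0. Proof. auto. Qed."
changed = align(old_v, new_v)
print(sorted(changed))  # ['n']
```

Each changed pair is one candidate repair example: the old definition, the new definition, and (elsewhere in the commit) the proofs that broke and how they were fixed.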
Long-Term Mentoring for Computer Science Researchers
Early in the pandemic, we -- leaders in the research areas of programming
languages (PL) and computer architecture (CA) -- realized that we had a
problem: the only way to form new lasting connections in the community was to
already have lasting connections in the community. Both of our academic
communities had wonderful short-term mentoring programs to address this
problem, but it was clear that we needed long-term mentoring programs.
Those of us in CA approached this scientifically, making an evidence-backed
case for community-wide long-term mentoring. In the meantime, one of us in PL
had impulsively launched an unofficial long-term mentoring program, founded on
chaos and spreadsheets. In January 2021, the latter grew to an official
cross-institutional long-term mentoring program called SIGPLAN-M; in January
2022, the former grew to Computer Architecture Long-term Mentoring (CALM).
The impacts have been strong: SIGPLAN-M reaches 328 mentees and 234 mentors
across 41 countries, and mentees have described it as "life changing" and "a
career saver." And while CALM is in its pilot phase -- with 13 mentors and 21
mentees across 7 countries -- it has received very positive feedback. The
leaders of SIGPLAN-M and CALM shared our designs, impacts, and challenges along
the way. Now, we wish to share those with you. We hope this will kick-start a
larger long-term mentoring effort across all of computer science.
Passport: Improving Automated Formal Verification Using Identifiers
Formally verifying system properties is one of the most effective ways of
improving system quality, but its high manual effort requirements often render
it prohibitively expensive. Tools that automate formal verification, by
learning from proof corpora to suggest proofs, have just begun to show their
promise. These tools are effective because of the richness of the data the
proof corpora contain. This richness comes from the stylistic conventions
followed by communities of proof developers, together with the logical systems
beneath proof assistants. However, this richness remains underexploited, with
most work thus far focusing on architecture rather than making the most of the
proof data.
In this paper, we develop Passport, a fully-automated proof-synthesis tool
that systematically explores how to most effectively exploit one aspect of that
proof data: identifiers. Passport enriches a predictive Coq model with three
new encoding mechanisms for identifiers: category vocabulary indexing, subword
sequence modeling, and path elaboration. We compare Passport to three existing
base tools which Passport can enhance: ASTactic, Tac, and Tok. In head-to-head
comparisons, Passport automatically proves 29% more theorems than the
best-performing of these base tools. Combining the three Passport-enhanced
tools automatically proves 38% more theorems than the three base tools
together, without Passport's enhancements. Finally, together, these base tools
and Passport-enhanced tools prove 45% more theorems than the combined base
tools without Passport's enhancements. Overall, our findings suggest that
modeling identifiers can play a significant role in improving proof synthesis,
leading to higher-quality software.
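One of Passport's encoding mechanisms, subword sequence modeling, can be illustrated with a small tokenizer. This is a hand-written sketch under assumed rules (split on underscores and case boundaries); Passport itself learns its subword vocabulary from the proof corpus rather than using fixed rules.

```python
import re
from typing import List

def subwords(ident: str) -> List[str]:
    """Split a Coq identifier into subword units on underscores and
    lowercase-to-uppercase boundaries. A hand-written stand-in for a
    learned subword vocabulary (e.g., byte-pair encoding)."""
    parts = []
    for chunk in ident.split("_"):
        # Runs of capitals, or one optional capital followed by
        # lowercase letters/digits.
        parts.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z0-9]+", chunk))
    return [p.lower() for p in parts if p]

print(subwords("rev_append"))    # ['rev', 'append']
print(subwords("NoDupMembers"))  # ['no', 'dup', 'members']
```

The payoff is generalization: a model that has seen `rev` and `append` separately can still say something useful about an identifier like `rev_append` it has never encountered whole.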
Proof Repair
Thesis (Ph.D.)--University of Washington, 2021. The days of verifying only toy programs are long gone. The last two decades have marked a new era of verification at scale, bringing strong guarantees to large and critical systems—an era of proof engineering. Proof engineering is for verified systems what software engineering is for unverified systems. Still, while proof engineering—like software engineering—is about both development and maintenance, most proof engineering technologies so far have focused on development. When it comes to maintaining these systems, proof engineering is decades behind software engineering. This thesis introduces proof repair: a new approach to maintaining verified systems. Proof repair reimagines the automation proof engineers typically use to interactively guide tools to search for a machine-checked proof. When a system changes and this breaks a proof about the system, traditional automation searches for the fixed proof from scratch. Proof repair, in contrast, is change-aware automation: it determines how the system has changed, and uses that information to help fix the broken proof. Proof repair in this thesis works by combining semantic differencing algorithms with program transformations. Importantly, both differencing and the transformations operate over low-level representations of proofs called proof terms. Thanks to the richness of these proof terms, differencing and the transformations can leverage new and existing results in dependent type theory. For example, one transformation externalizes univalent transport from homotopy type theory, leveraging novel transformations over equalities to make this possible. This approach is realized inside of a proof repair tool suite for the Coq proof assistant. Case studies show both retroactively and by live use that this proof repair tool suite can save work for proof engineers on real proof developments.
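The change-aware workflow — diff the changed definition, then use the difference to patch broken proofs — can be caricatured in a few lines. Everything here is an assumption for illustration: the real tool suite diffs and transforms proof terms inside Coq's type theory, not proof-script text, and the rename-by-position heuristic below is only a toy "semantic diff".

```python
import re
from typing import Dict

def diff_renames(old_def: str, new_def: str) -> Dict[str, str]:
    """Guess constructor renamings by position between two versions of
    an inductive definition: a toy 'semantic diff'. (The real tool
    diffs proof terms, not source text.)"""
    grab = lambda s: re.findall(r"\|\s*(\w+)", s)
    return {o: n for o, n in zip(grab(old_def), grab(new_def)) if o != n}

def repair(proof: str, renames: Dict[str, str]) -> str:
    """Apply the discovered renamings to a broken proof script."""
    for old, new in renames.items():
        proof = re.sub(rf"\b{old}\b", new, proof)
    return proof

old_def = "Inductive bit := | O | I."
new_def = "Inductive bit := | Zero | One."
broken = "Proof. destruct b; [ exact O | exact I ]. Qed."

renames = diff_renames(old_def, new_def)
print(repair(broken, renames))
# Proof. destruct b; [ exact Zero | exact One ]. Qed.
```

Even this caricature shows the contrast the abstract draws: instead of re-searching for the proof from scratch, the change itself tells the automation what to fix.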
QED at large: a survey of engineering of formally verified software
This monograph provides the reader with an insightful overview of the work that has led to modern-day techniques for formally verifying software. In an era of increasing automation, formal verification underpins many software systems, so future trends are also highlighted.
White matter hyperintensities correlate to cognition and fiber tract integrity in older adults with HIV
Our aim was to examine the clinical relevance of white matter hyperintensities (WMH) in HIV. We used an automated approach to quantify WMH volume in HIV seropositive (HIV+; n = 65) and HIV seronegative (HIV-; n = 29) adults over age 60. We compared WMH volumes between HIV+ and HIV- groups in cross-sectional and multiple time-point analyses. We also assessed correlations between WMH volumes and cardiovascular measures, HIV severity, cognitive scores, and diffusion tensor imaging variables. Serostatus groups did not differ in WMH volume, but HIV+ participants had less cerebral white matter (mean: 470.95 [43.24] vs. 497.63 [49.42] mL, p = 0.010). The distribution of WMH volume was skewed in HIV+, with a high proportion (23%) falling above the 95th percentile of WMH volume defined by the HIV- group. Serostatus groups had similar amounts of WMH volume growth over time. Total WMH volume directly correlated with measures of hypertension and inversely correlated with measures of global cognition, particularly executive functioning and psychomotor speed. Greater WMH volume was associated with poorer brain integrity measured from diffusion tensor imaging (DTI) in the corpus callosum and sagittal stratum. In this group of HIV+ individuals over 60, WMH burden was associated with cardiovascular risk and with both worse diffusion MRI and worse cognition. The median total burden did not differ by serostatus; however, a subset of HIV+ individuals had high WMH burden.
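The skewness analysis — taking the 95th percentile of the seronegative group as a cut point and asking what fraction of the seropositive group lies above it — can be sketched with standard-library statistics. The volumes below are simulated placeholders, not study data; only the method is taken from the abstract.

```python
import random
import statistics

random.seed(0)
# Hypothetical WMH volumes (mL) for illustration only — NOT study data.
hiv_neg = [random.lognormvariate(0.0, 0.6) for _ in range(29)]
hiv_pos = [random.lognormvariate(0.3, 0.9) for _ in range(65)]

# 95th percentile of the seronegative group: the last of 19 cut points
# that divide the data into 20 equal-probability intervals.
cutoff = statistics.quantiles(hiv_neg, n=20)[-1]

# Proportion of seropositive participants above that threshold.
prop_high = sum(v > cutoff for v in hiv_pos) / len(hiv_pos)
print(f"95th percentile (HIV-): {cutoff:.2f} mL; HIV+ above it: {prop_high:.0%}")
```

By construction about 5% of the reference group exceeds its own 95th percentile, so a markedly larger proportion in the comparison group (23% in the study) signals the skew the abstract reports.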